Space-Economical Construction of Index Structures for All Suffixes of a String
نویسندگان
چکیده
The minimum all-suffixes directed acyclic word graph (MASDAWG) of a string w has |w| + 1 initial nodes, where the dag induced by all reachable nodes from the k-th initial node conforms with the DAWG of the k-th suffix of w. A new space-economical algorithm for the construction of MASDAWG(w) is presented. The algorithm reads a given string w from right to left, and constructs MASDAWG(w) without suffix links. It performs in time linear in the output size. Furthermore, we introduce the minimum all-suffixes compact DAWG (MASCDAWG). CDAWGs are known to be more space-economical than DAWGs, and thus MASCDAWG(w) requires smaller space than MASDAWG(w). We present an on-line (right-to-left) algorithm to build MASCDAWG(w) without suffix links, whose running time is also linear in its size.
منابع مشابه
Suffix Array of Alignment: A Practical Index for Similar Data
The suffix tree of alignment is an index data structure for similar strings. Given an alignment of similar strings, it stores all suffixes of the alignment, called alignment-suffixes. An alignment-suffix represents one suffix of a string or suffixes of multiple strings starting at the same position in the alignment. The suffix tree of alignment makes good use of similarity in strings theoretica...
متن کاملDeterministic Sub-Linear Space LCE Data Structures With Efficient Construction
Given a string S of n symbols, a longest common extension query LCE(i, j) asks for the length of the longest common prefix of the ith and jth suffixes of S. LCE queries have several important applications in string processing, perhaps most notably to suffix sorting. Recently, Bille et al. (J. Discrete Algorithms 25:42–50, 2014, Proc. CPM 2015:65–76) described several data structures for answeri...
متن کاملSuffix Tree
SYNONYMS Compact suffix trie DEFINITION The suffix tree S(y) of a non-empty string y of length n is a compact trie representing all the suffixes of the string. The suffix tree of y is defined by the following properties: All branches of S(y) are labeled by all suffixes of y. • • Edges of S(y) are labeled by strings. • Internal nodes of S(y) have at least two children. • Edges outgoing an intern...
متن کاملSpace Efficient Linear Time Construction of Suffix Arrays
We present a linear time algorithm to sort all the suffixes of a string over a large alphabet of integers. The sorted order of suffixes of a string is also called suffix array, a data structure introduced by Manber and Myers that has numerous applications in pattern matching, string processing, and computational biology. Though the suffix tree of a string can be constructed in linear time and t...
متن کاملEfficient de novo assembly of large genomes using compressed data structures - Supplemental Materials and Methods
The suffix array is a compact representation of the lexicographic ordering of the suffixes of a text [1]. Each element of the array is an index into the original string; SAX [i] = j indicates that the suffix starting at position j in T is the i-th lowest suffix in X. As an example consider the string T = AGATCGATA$. The suffix array of T is SAT = [10, 9, 1, 7, 3, 5, 6, 2, 8, 4]. As the suffix a...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2002